1.  INTRODUCTION


1.1  Objectives 

The  main  objectives  of  this  two-year  study  are  described  in  Volume  I  of  this  annual 
report.  The  first  objective  is  to  assemble  data  sets  to  be  used  for  test  and  evaluation  of  the 
performance  of  neural  networks  for  automated  processing  and  interpretation  of  seismic 
data.  These  data  sets  are  provided  to  MIT  Lincoln  Laboratory  for  their  effort  on  the 
development  of  neural  networks  for  seismic  application  under  DARPA’s  Artificial  Neural 
Network  Technology  program. 

A  second  objective  of  this  study  is  to  evaluate  the  results  of  this  neural  network 
program  in  the  context  of  monitoring  nuclear  explosion  testing.  To  achieve  this  objective, 
we  will  test  and  evaluate  the  neural  network  application  developed  by  MIT  Lincoln 
Laboratory, and we have developed our own neural network application for automated
initial identification of seismic phases (P or S) using data recorded by 3-component
stations.

1.2  Current  Status 

A  major  effort  during  the  first  year  of  this  project  was  spent  on  the  development  of 
data  sets  for  test  and  evaluation  of  neural  networks  for  seismic  signal  processing  and 
interpretation. This effort is described in detail in Volume I of the report [Sereno and
Patnaik,  1991]. 

An important problem in automated seismic data interpretation is initial phase
identification (P or S) using data recorded by 3-component stations. We developed and
tested a neural network approach to this problem using data recorded by the 3-component
elements of the array stations ARCESS, NORESS, FINESA and GERESS, and
3-component stations in Poland (KSP) and in the former Soviet Union (GARM). Since the
polarization  data  from  array  stations  are  averaged  during  IMS  processing,  we  also  applied 
our  method  separately  to  data  recorded  by  the  individual  3-component  elements  of 
ARCESS  and  NORESS.  The  neural  network  results  were  compared  to  results  from  a 
linear  multivariate  analysis  of  the  same  data,  and  adaptability  of  the  neural  networks  was 
examined  by  testing  them  with  data  from  stations  in  other  geological  environments.  We 
implemented  our  neural  network  software  into  a  test  version  of  ESAL  (Expert  System  for 
Association  and  Location),  which  is  a  knowledge-based  component  of  the  Intelligent 
Monitoring  System  (IMS).  The  integration  and  testing  of  the  first  version  of  the  module  is 
complete.  We  will  begin  testing  and  evaluating  its  performance  as  soon  as  data  from  the 
IRIS  stations  become  available.  The  next  version  of  this  software  will  accommodate 
“context”  as  input  to  the  trained  neural  network,  which  has  shown  improvement  in 
identification  by  3-5%,  compared  to  using  polarization  data  alone. 
1.3  Outline of the Report

This  annual  report  is  divided  into  two  volumes.  Volume  I  is  a  description  of  Data 
Set  #1  that  was  provided  to  MIT  Lincoln  Laboratory  for  their  neural  network  application 
development  [Sereno  and  Patnaik,  1991].  Volume  II  (this  report)  describes  the  results  of 
our  neural  network  application  to  the  problem  of  initial  phase  identification,  using 
polarization  attributes  derived  from  data  recorded  by  3-component  stations.  Descriptions 
of  the  design,  development  and  testing  of  the  neural  networks  are  provided.  Preliminary 
results of comparative evaluation of the neural network approach with the multivariate
discriminant analysis are also reported. Finally, this report describes the work being
done on the implementation of the developed neural network module into the IMS.

Section  2  describes  the  data  used  for  neural  network  training  and  testing.  Section  3 
describes  the  neural  network  simulation.  Section  3.1  describes  why  we  use  the  neural 
network  approach.  Section  3.2  describes  back  propagation  neural  networks  and  the 
architecture  used  in  this  study.  Section  3.3  describes  the  methods  adopted  for  network 
parameter  estimation,  preprocessing  strategy  for  the  input  parameters,  and  the  methods  of 
training and testing. Section 4 describes our results. Section 4.1 gives results for each
3-component element of the arrays NORESS and ARCESS. Section 4.2 describes the
comparative  evaluation  of  the  results  with  those  obtained  from  a  multivariate  discriminant 
analysis  method.  Section  4.3  describes  the  adaptability  of  the  trained  neural  networks  to 
other  geological  conditions.  Section  4.4  describes  the  on-going  work  for  improving  the 
identification  performance  by  adding  context  in  the  simulation.  Section  5  describes  the 
on-going  work  on  the  integration  of  our  neural  networks  into  IMS.  Finally,  Section  6 
summarizes  the  study  of  neural  computing  for  initial  seismic  phase  identification. 
2.  THE DATA


The  data  used  in  this  study  are  primarily  the  polarization  parameters  that  are 
routinely  computed  by  the  IMS  and  written  to  an  on-line  relational  database  at  CSS 
[Bache et al., 1990]. These data are associated with regional P-type and S-type phases
that  have  been  identified  by  seismic  analysts  at  NORSAR  and  CSS.  The  analysts’ 
identifications  of  these  phases  are  used  as  ground-truths  for  neural  network  training.  We 
used  data  from  four  regional  arrays  (ARCESS,  NORESS,  FINESA  and  GERESS),  and 
two  3-component  stations  (KSP  and  GARM).  The  locations  of  these  stations  are  plotted  in 
Figure  1. 


LOCATION  OF  THE  ARRAYS  AND  SINGLE  STATIONS 


Figure 1.  The locations of the high-frequency arrays ARCESS (ARC), FINESA (FIN), GERESS (GER)
and NORESS (NOR), and of the 3-component stations KSP and GARM, are shown.
The method used in IMS for particle-motion analysis was developed by Jurkevics
[1988].  It  computes  a  polarization  ellipsoid  within  overlapping  time  windows  by  solving 
the  eigenproblem  of  the  covariance  matrix.  Data  from  the  3-component  sensors  from  the 
arrays  are  combined  by  averaging  the  individual  covariance  matrices  before  solving  the 
eigenproblem.  The  covariance  matrices  are  computed  in  the  time  domain  for  several 
frequency  bands,  and  then  normalized  and  averaged  to  obtain  a  wide-band  estimate  for 
each of the overlapping windows. The IMS implementation of this is described by Bache
et al. [1990].
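The covariance averaging described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the IMS implementation: the function names are hypothetical, the vertical component is assumed to be stored first, and the per-band normalization and wide-band averaging step is omitted.

```python
import numpy as np


def covariance(window):
    """Return the 3x3 covariance matrix of one window, shape (n_samples, 3)."""
    centered = window - window.mean(axis=0)
    return centered.T @ centered / len(centered)


def averaged_eigenproblem(windows):
    """Average the per-sensor covariance matrices, then solve one eigenproblem.

    Returns eigenvalues in descending order and the matching eigenvectors
    as columns.
    """
    mean_cov = np.mean([covariance(w) for w in windows], axis=0)
    eigenvalues, eigenvectors = np.linalg.eigh(mean_cov)  # ascending order
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], eigenvectors[:, order]


# Synthetic example (hypothetical data): four sensors recording motion that is
# almost entirely along the first axis, taken here to be the vertical.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
sensors = [
    np.column_stack([
        np.sin(20.0 * t),
        0.05 * rng.standard_normal(t.size),
        0.05 * rng.standard_normal(t.size),
    ])
    for _ in range(4)
]
eigenvalues, eigenvectors = averaged_eigenproblem(sensors)
```

Averaging the matrices before the decomposition, rather than averaging the resulting attributes, is what lets the four 3-component sensors of an array contribute to a single polarization ellipsoid.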

The effects of signal-to-noise ratio (snr) on the polarization parameter estimation
have  been  studied  in  detail  by  Suteau-Henson  [1991].  Large  scatter  is  observed  for  data 
with  snr  <  2.0.  Therefore,  we  have  considered  two  separate  categories  of  data  (all  snr  and 
snr  >  2.0)  for  all  of  our  neural  network  simulations.  The  snr  is  the  ratio  of  the  maximum 
signal  3-component  amplitude  to  the  maximum  pre-arrival  noise  3-component  amplitude. 
The  3-component  amplitude  is  measured  from  the  time  window  with  maximum 
rectilinearity,  and  is  equal  to  the  sum  of  the  square  roots  of  the  eigenvalues  (i.e.,  it  is  the 
sum  of  the  amplitudes  measured  along  the  three  axes  of  the  polarization  ellipsoid). 
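As a small worked illustration of this definition, the sketch below computes the 3-component amplitude from a set of eigenvalues and forms the snr as a ratio of two such amplitudes. The function names and the eigenvalues are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math


def three_component_amplitude(eigenvalues):
    """Sum of the square roots of the three eigenvalues of one window."""
    return sum(math.sqrt(max(value, 0.0)) for value in eigenvalues)


def snr(signal_eigenvalues, noise_eigenvalues):
    """Ratio of signal to pre-arrival noise 3-component amplitude."""
    return (three_component_amplitude(signal_eigenvalues)
            / three_component_amplitude(noise_eigenvalues))


# Hypothetical eigenvalues: (3 + 1 + 0.5) / (1 + 0.5 + 0.5)
example = snr((9.0, 1.0, 0.25), (1.0, 0.25, 0.25))
in_high_snr_set = example > 2.0   # membership in the "snr > 2.0" category
```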

Several  of  the  particle  motion  attributes  are  calculated  from  the  time  window  with 
the  maximum  rectilinearity.  These  are  called  P-type  attributes  in  the  descriptions  below. 
Also,  several  attributes  are  calculated  from  the  time  window  with  maximum  3-component 
amplitude. They are called S-type attributes. The polarization attributes used in this study
are  defined  as: 

freq:  Center frequency of the passbands with snr > 1.5 used in the averaging. The
passbands are 1-2, 2-4, 4-8 and 8-16 Hz. For example, if all bands had snr
> 1.5, then freq would be 8.5 Hz.


rect:  Signal rectilinearity defined as:

$$ rect = 1 - \frac{\lambda_3 + \lambda_2}{2 \lambda_1} $$

where $\lambda_1$, $\lambda_2$ and $\lambda_3$ are the eigenvalues such that $\lambda_1 > \lambda_2 > \lambda_3$. rect is a
P-type attribute.


plans:  Signal planarity defined as:

$$ plans = 1 - \frac{2 \lambda_3}{\lambda_1 + \lambda_2} $$

Planarity is measured from the window with the maximum 3-component
amplitude (S-type attribute).
hvrat:  Horizontal to vertical power ratio defined as:

$$ hvrat = \frac{c_3 + c_2}{2 c_1} $$

where $c_1$, $c_2$ and $c_3$ are the diagonal elements of the covariance matrix, and
$c_1$ corresponds to the vertical component. hvrat is an S-type attribute.

hvratp:  Similar to hvrat, but measured at the time of maximum rectilinearity. It is a
P-type attribute.

hmxmn:  Maximum  to  minimum  horizontal  amplitude  ratio  defined  as: 

hmxmn  = 

where $\lambda_1$ and $\lambda_2$ are the maximum and minimum eigenvalues obtained by
solving the 2-D eigensystem, using only the horizontal components. It is an
S-type attribute.

inang3:  Incidence angle (measured from the vertical) of the eigenvector associated
with the smallest eigenvalue. It is also called the short-axis incidence angle
and is an S-type attribute.

inang1:  Apparent incidence angle (measured from the vertical) of the eigenvector
associated with the largest eigenvalue. It is also called the long-axis
incidence angle, or the emergence angle, and is a P-type attribute.
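Three of the definitions above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the rectilinearity expression follows the form given by Jurkevics [1988], the vertical component is assumed to be stored first in each eigenvector, and the input values are made up for the arithmetic.

```python
import math


def rectilinearity(l1, l2, l3):
    """rect = 1 - (l2 + l3) / (2 * l1), with l1 >= l2 >= l3."""
    return 1.0 - (l2 + l3) / (2.0 * l1)


def horizontal_to_vertical(c1, c2, c3):
    """hvrat = (c2 + c3) / (2 * c1); c1 is the vertical diagonal element."""
    return (c2 + c3) / (2.0 * c1)


def incidence_angle(eigenvector):
    """Angle in degrees between an eigenvector and the vertical.

    Assumes the vertical component is stored first. The absolute value folds
    the result into 0-90 degrees, since an eigenvector's sign is arbitrary.
    """
    vertical = abs(eigenvector[0])
    length = math.sqrt(sum(x * x for x in eigenvector))
    return math.degrees(math.acos(min(1.0, vertical / length)))


# Hypothetical inputs.
rect_example = rectilinearity(4.0, 1.0, 1.0)            # 1 - 2/8
hvrat_example = horizontal_to_vertical(2.0, 3.0, 5.0)   # 8/4
angle_example = incidence_angle((0.5, math.sqrt(0.75), 0.0))
```

Purely rectilinear motion (two vanishing eigenvalues) gives rect = 1, while equal eigenvalues give rect = 0, which is why rect is expected to be higher for P-type than for S-type phases.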

Figures  2-9  show  histograms  of  these  attributes  for  P-type  and  S-type  phases 
recorded  by  each  of  the  stations  mentioned  previously.  The  number  of  P-type  and  S-type 
phases  that  are  used  for  each  station  are  shown  in  parentheses  and  range  from  a  few 
hundred  to  several  thousand.  In  addition  to  noticeable  station-dependence  in  these  data, 
(e.g.,  P-rectilinearity  at  NORESS),  these  histograms  show  considerable  overlap  for  P-type 
and  S-type  phases.  This  is  in  contrast  to  the  array  measurement  of  phase  velocity  (Figure 
10).  Accurate  estimates  of  this  single  parameter  enable  near  perfect  identification  of  P- 
type  and  S-type  phases  for  array  data.  Since  this  parameter  is  not  available  for  single  3- 
component  data,  automated  phase  identification  is  performed  from  the  polarization 
attributes.  Neural  network  classifiers  are  well-suited  for  this  type  of  situation  since  they 
are  capable  of  constructing  non-linear  decision  surfaces  across  complex  class  boundaries 
from  high-dimensional  input  data. 
Figure 4.  Histograms of "short-axis incidence angle" are shown for P-type phases (upper) and S-type phases (lower). The numbers in
parentheses indicate the number of associated phases. The P and S populations show overlaps at all observing stations.
Figure 5.  Histograms of "logarithm of the ratio of horizontal to vertical power" are shown for P-type phases (upper) and S-type phases (lower).
The numbers in parentheses indicate the number of associated phases. The P and S populations show overlaps at all observing stations.
[Figure 10 plot: ARCESS; horizontal axis: Phase Velocity (km/s), 2 to 12]

Figure 10.  Phase velocity is plotted for P and S phases recorded at ARCESS. Phase velocity is estimated
using a wide-band frequency-wavenumber (f-k) algorithm [Kvaerna and Doornbos, 1986]. This calculation
is done using all available vertical channels (up to 25 array elements). Note that a phase velocity of 6 km/s
almost perfectly separates P-type and S-type phases.
3.  NEURAL NETWORK SIMULATION


In  this  section  we  describe  the  design  and  implementation  of  our  neural  networks 
for  automated  initial  identification  of  seismic  phases  recorded  by  3-component  stations. 
The  goal  is  to  identify  the  phase  type  (P  or  S)  based  on  the  polarization  attributes  described 
in  the  previous  section. 

3.1  Why  Neural  Networks? 

There  are  several  techniques  that  can  be  used  for  automated  initial  identification  of 
seismic  phases  using  polarization  attributes.  The  current  rule-based  system  of  IMS  has 
explicit  rules  (knowledge  sources)  for  this  task.  However,  it  is  difficult  to  develop  rules 
for  tasks  that  use  multivariate  data  (8-10  polarization  attributes).  In  addition,  polarization 
characteristics  are  site-specific,  so  a  new  set  of  rules  must  be  developed  each  time  a  new 
station  is  added  to  the  seismic  network.  Multivariate  statistical  techniques  are  applicable 
in this situation [Suteau-Henson, 1991]. However, their required assumption of normality
of the data and their linearity render them sensitive to outliers and noise, particularly for
low snr. The neural networks used in our study do not require the normality assumption
and  are  less  sensitive  to  outliers.  These  networks  offer  a  data-intensive,  case-based 
approach  to  the  problem.  The  functional  relation  between  the  polarization  attributes  and 
the  corresponding  phase-type  is  derived  as  a  network  of  nodes  and  weights  connecting 
these  nodes.  Also,  neural  networks  are  amenable  to  machine-learning  techniques  and  are 
easily  adapted  to  data  from  new  stations. 

There  have  been  successful  applications  of  this  technique  in  seismological 
problems [Patnaik, 1989; Patnaik et al., 1990; Patnaik and Mitchell, 1990; Dysart and
Pulli, 1990; and Dowla et al., 1990]. In the next section we briefly describe the particular
type  of  neural  network  used  in  our  study. 
3.2  Neural Networks with Back Propagation Training

The neural network architecture that we used has three layers: eight input nodes,
four middle (hidden) nodes, and two output nodes (Figure 11). The input layer with eight
nodes corresponds to the eight polarization attributes, and the two output nodes correspond to
P-type and S-type phases. All of the networks have four middle-layer nodes. The number
of these nodes was determined empirically as described in Section 3.3.2. The inputs to
each node in the middle layer are weighted sums of the polarization attributes, and the
output of a node is calculated by applying a non-linear thresholding function to its input
(Figure 12). These nodes act as thresholding units; the thresholding function suppresses
the outputs to between 0 and 1. Determination of the appropriate weights among the
nodes constitutes network training or learning. The weights $w_{ij}$ converging to a node $a_j$
may be thought of as the coefficients of an equation representing an $[i-1]$-dimensional
plane. The nodes $a_j$, with their weights $w_{ij}$, thus partition the input space (training
samples) into segments bounded by hyperplanes. These segmented regions each
represent a class (sub-class) of the data. During training the positions of these
hyperplanes change. The training is based on applications to signals with known output
classifications. For network training, we employ a variation of the back-error propagation
algorithm described by Rumelhart and McClelland [1986].
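The forward propagation through such a network can be sketched as below. This is a minimal illustration, not the software used in this study: the function names are hypothetical, the logistic function is assumed as the thresholding function, a bias term is applied at the middle layer only, and the weights are placeholders rather than trained values.

```python
import math


def sigmoid(x):
    """Thresholding function that maps any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, w_in_mid, bias_mid, w_mid_out):
    """Propagate one pattern through an 8-4-2 style network.

    w_in_mid[i][j] connects input i to middle node j; w_mid_out[j][k]
    connects middle node j to output node k. Returns (middle, outputs).
    """
    middle = [
        sigmoid(sum(w_in_mid[i][j] * a for i, a in enumerate(inputs)) + bias_mid[j])
        for j in range(len(bias_mid))
    ]
    n_out = len(w_mid_out[0])
    outputs = [
        sigmoid(sum(w_mid_out[j][k] * m for j, m in enumerate(middle)))
        for k in range(n_out)
    ]
    return middle, outputs


# Placeholder weights (not trained values): with all weights zero every node
# sits at the midpoint of the thresholding function.
pattern = [0.2] * 8
w1 = [[0.0] * 4 for _ in range(8)]
w2 = [[0.0] * 2 for _ in range(4)]
middle, outputs = forward(pattern, w1, [0.0] * 4, w2)
```

The phase assigned to a pattern is the one whose output node has the higher activation.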


[Figure 11 diagram: output nodes P and S; input nodes Freq, Rect, Plan, Ang1, Ang3, Hmxmn, Hvratp, Hvrat]

Figure 11.  A simple 3-layer, feed-forward neural network with eight input nodes ($a_i$), four middle (hidden)
nodes ($a_j$), and two output nodes ($a_k$). $w_{ij}$ are the weights from input to middle layer and $w_{jk}$ are the
weights from middle to output layer.
Figure 12.  Non-linear thresholding function used at the middle and output layer nodes.


The training is accomplished by minimizing the sum square error, E, measured at the
output units. This error is defined as:

$$ E = \sum_k (o_k - t_k)^2 $$

where $o_k$ is the output produced at node "k" by propagating input patterns $[a_i]$ through the
network. The term $t_k$ is the desired output of node "k", which is the teaching signal. The
vector $[a_i]$ is the vector of polarization parameters and $t_k$ is either 1 or 0 depending on
whether $[a_i]$ corresponds to a P-type or S-type phase.
An output $o_j$ can be represented as:

$$ o_j = \frac{1}{1 + e^{-\left(\sum_i w_{ij} a_i + x_j\right)}} $$

The error term is propagated back to the middle layer nodes using the generalized
delta rule [Rumelhart and McClelland, 1986], which optimizes the weights by
gradient descent. Two parameters used in this algorithm, the learning rate and the
momentum constant, are adjusted by trial and error during training. We found this process
to be much slower than a conjugate-gradient optimization technique, which we
have used for all of our network training. The conjugate-gradient technique also obviates
the need for these heuristic parameters.
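For reference, the gradient-descent update of the generalized delta rule can be sketched as follows. This illustrates the slower method with a learning rate and a momentum constant, not the conjugate-gradient technique used for the networks in this report. All names, the small 3-2-2 network, and the weight values are hypothetical; the bias terms are held fixed for brevity.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(a, w_ij, x_j, w_jk):
    """Return middle-layer and output activations for input pattern a."""
    mid = [sigmoid(sum(w_ij[i][j] * a[i] for i in range(len(a))) + x_j[j])
           for j in range(len(x_j))]
    out = [sigmoid(sum(w_jk[j][k] * mid[j] for j in range(len(mid))))
           for k in range(len(w_jk[0]))]
    return mid, out


def sum_square_error(out, target):
    return sum((o - t) ** 2 for o, t in zip(out, target))


def gradients(a, target, w_ij, x_j, w_jk):
    """Gradients of E = sum_k (o_k - t_k)^2 for both weight layers."""
    mid, out = forward(a, w_ij, x_j, w_jk)
    # Error signal at each output node: dE/d(net_k).
    delta_k = [2.0 * (o - t) * o * (1.0 - o) for o, t in zip(out, target)]
    # Error signal propagated back to each middle node: dE/d(net_j).
    delta_j = [m * (1.0 - m) * sum(w_jk[j][k] * delta_k[k] for k in range(len(out)))
               for j, m in enumerate(mid)]
    grad_jk = [[mid[j] * delta_k[k] for k in range(len(out))] for j in range(len(mid))]
    grad_ij = [[a[i] * delta_j[j] for j in range(len(mid))] for i in range(len(a))]
    return grad_ij, grad_jk


def momentum_step(weights, grad, previous_change, rate, momentum):
    """One update: change = -rate * gradient + momentum * previous change."""
    change = [[-rate * g + momentum * p for g, p in zip(g_row, p_row)]
              for g_row, p_row in zip(grad, previous_change)]
    updated = [[w + c for w, c in zip(w_row, c_row)]
               for w_row, c_row in zip(weights, change)]
    return updated, change


# Hypothetical 3-2-2 network and one training pattern with target (1, 0).
a, target = [0.5, -0.2, 0.1], [1.0, 0.0]
w_ij = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.2]]
x_j = [0.0, 0.1]
w_jk = [[0.2, -0.1], [-0.3, 0.4]]
error_before = sum_square_error(forward(a, w_ij, x_j, w_jk)[1], target)
change_ij = [[0.0] * 2 for _ in range(3)]
change_jk = [[0.0] * 2 for _ in range(2)]
for _ in range(50):
    grad_ij, grad_jk = gradients(a, target, w_ij, x_j, w_jk)
    w_ij, change_ij = momentum_step(w_ij, grad_ij, change_ij, rate=0.5, momentum=0.5)
    w_jk, change_jk = momentum_step(w_jk, grad_jk, change_jk, rate=0.5, momentum=0.5)
error_after = sum_square_error(forward(a, w_ij, x_j, w_jk)[1], target)
```

The `rate` and `momentum` arguments are the two heuristic parameters that must be tuned by trial and error in this scheme.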

The term $x_j$, shown in the expression for a node output, represents a bias node. As
shown in Figure 11, this bias node produces four weights connecting to the four middle
layer nodes. These weights translate (shift) the thresholding function of each node. For a
trained network, this means that a bias, which depends on the site-specific
observed polarization patterns and the number of occurrences of such patterns, is built into
the network weights.

3.3  The  Method 

As  described  in  the  previous  sections,  we  use  neural  networks  as  pattern  matchers. 
For  our  purpose,  the  vector  of  polarization  parameters  constitutes  a  pattern  corresponding 
to a given phase. The ground-truths are analyst-verified phase identifications and are
given as teaching inputs. The neural network parameters are problem-dependent (like the
number of nodes in the middle layer) and were estimated empirically as described later in
this section.


3.3.1  Data  Processing 

As  mentioned  in  Section  2,  the  input  data  for  our  neural  network  training  were 
derived  from  the  polarization  processing  of  IMS.  The  eight  polarization  attributes 
described  in  Section  2  were  selected  for  the  available  associated  P-type  and  S-type  phases 
recorded at each station. For these measured attributes, the value of freq ranges from 1 Hz
to 12 Hz; the incidence angles inang1 and inang3 range between 0° and 90°; rect and plans
range between 0.0 and 1.0; and the amplitude and power ratio parameters hmxmn, hvratp
and hvrat range from 0 to approximately 10. In order to keep the weights and weight
changes small, the usual convention is to scale the input parameters to small numeric
values, near ±1. We tried several preprocessing strategies to achieve this. The best
performance was obtained by replacing freq with 1/freq, dividing inang1 and inang3 by
90, and compressing the amplitude parameters by taking their natural logarithm.
Therefore,  we  applied  this  preprocessing  strategy  to  the  inputs  for  all  of  our  neural 
networks. 
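The preprocessing strategy can be sketched as a single function. This is an illustration only: the function name, the dictionary layout, the key spellings, and the example values are assumptions. Note that the logarithm requires strictly positive ratios.

```python
import math


def preprocess(attributes):
    """Scale one set of polarization attributes toward small numeric values."""
    scaled = dict(attributes)                      # rect and plans pass through
    scaled["freq"] = 1.0 / attributes["freq"]      # replace freq with 1/freq
    for name in ("inang1", "inang3"):              # degrees -> fraction of 90
        scaled[name] = attributes[name] / 90.0
    for name in ("hmxmn", "hvratp", "hvrat"):      # compress ratios with ln
        scaled[name] = math.log(attributes[name])  # requires a positive ratio
    return scaled


# Hypothetical attribute values for a single detection.
raw = {"freq": 4.0, "rect": 0.8, "plans": 0.6, "inang1": 45.0, "inang3": 81.0,
       "hmxmn": 2.0, "hvratp": 0.5, "hvrat": math.e}
scaled = preprocess(raw)
```

After this step the frequency term lies between roughly 0.08 and 1, the angles between 0 and 1, and a ratio of 10 maps to about 2.3.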
3.3.2  Architecture


We  conducted  numerous  experiments  to  choose  the  optimum  network  parameters. 
These  experiments  involved  adjusting  the  network  learning  rate,  the  number  of  nodes  in 
the  middle  layer,  and  the  choice  of  polarization  attributes  using  the  3-component  data 
recorded  at  ARCESS.  For  example,  our  method  for  selecting  the  number  of  nodes  in  the 
middle  layer  is  illustrated  in  Figure  13.  This  shows  the  percentage  of  identification 
accuracy  versus  the  number  of  nodes  for  P-type  and  S-type  phases  recorded  at  ARCESS. 
As  shown  in  Figure  13,  networks  with  more  than  4-5  nodes  in  the  middle  layer  increase 
complexity  without  improving  identification  accuracy.  Therefore,  we  implemented  four 
middle  layer  nodes  in  all  of  our  networks.  Similarly,  several  combinations  of  polarization 
attributes  were  used  as  input  patterns  in  order  to  identify  the  most  significant  attributes 
(e.g.,  varying  number  of  input  nodes).  The  identification  accuracy  is  close  to  85%  for  all 
snr ARCESS data using four parameters (rect, inang1, hvrat and inang3). By adding the
rest  of  the  polarization  attributes,  this  accuracy  increased  by  5-7%  without  increasing  the 
training  time  significantly.  Therefore,  we  used  all  eight  polarization  attributes  as  input  to 
our  simulations. 
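One way to make the choice of middle-layer size explicit is to pick the smallest size whose accuracy is within a tolerance of the best. The sketch below is a hypothetical selection rule, not the procedure used in this study (the choice here was made by inspecting Figure 13), and the accuracies are placeholders, not values read from that figure.

```python
def smallest_adequate_size(accuracy_by_size, tolerance=1.0):
    """Smallest middle-layer size within `tolerance` points of the best accuracy."""
    best = max(accuracy_by_size.values())
    return min(size for size, acc in accuracy_by_size.items()
               if best - acc <= tolerance)


# Placeholder accuracies (percent), NOT values read from Figure 13.
placeholder = {1: 80.0, 2: 86.0, 3: 90.0, 4: 92.0, 5: 92.3, 6: 92.1, 8: 92.4}
chosen = smallest_adequate_size(placeholder)
```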


EMPIRICALLY  ESTIMATED  MIDDLE-LAYER  NODES 


no.  of  nodes 


Figure  13.  Percentage  identification  accuracy  versus  the  number  of  nodes  in  the  middle  layer.  This 
example is for P-type and S-type phases with snr > 2.0 recorded at ARCESS. The networks are of the form
8-X-2,  where  8  is  the  number  of  inputs,  X  is  the  variable  number  of  nodes  in  the  middle  layer,  and  2  is  the 
number  of  output  nodes  (P  or  S). 




Figure  14  shows  the  schematic  3-layer  architecture  for  the  resulting  network  for 
the  station  GARM.  The  final  weight  configurations  (two  weight  matrices)  are  derived  by 
using the method described in Section 3.2. As shown in Figure 14, the higher activation of
the P-output node implies that the set of polarization attributes identified the associated
phase as a P-type phase. A node activation value of 0.5 would represent an indeterminate
case  (see  Section  3.3.4). 


3.3.3  Network Training

Our  results  of  identification  accuracy  are  based  on  training  the  networks  with  2/3 
of  the  data,  and  evaluating  the  performance  (testing)  on  the  remaining  1/3.  Stability  is 
established  by  applying  this  test  three  times,  each  time  using  a  different  1/3  of  the  data  for 
testing  for  each  station.  The  results  are  reported  as  the  average  of  the  three  tests,  since  no 
appreciable  differences  among  the  results  for  different  test  sets  were  noticed. 
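The three-way rotation of training and test sets can be sketched as follows. The function names are hypothetical, and the interleaved assignment of samples to thirds is an assumption, since the way the data were divided is not specified here.

```python
def three_way_rotations(samples):
    """Yield (train, test) pairs; each third of the data is the test set once."""
    thirds = [samples[start::3] for start in range(3)]
    for held_out in range(3):
        train = [s for k in range(3) if k != held_out for s in thirds[k]]
        yield train, thirds[held_out]


def average_accuracy(samples, train_and_score):
    """Average the test accuracy over the three rotations.

    `train_and_score(train, test)` is a caller-supplied (hypothetical) function
    that trains a network on `train` and returns its accuracy on `test`.
    """
    scores = [train_and_score(train, test)
              for train, test in three_way_rotations(samples)]
    return sum(scores) / len(scores)


splits = list(three_way_rotations(list(range(9))))
```

Every sample is used for testing exactly once and for training twice, so the averaged figure reflects all of the available data.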

Training  a  typical  neural  network  required  approximately  500  presentations 
(forward  propagation,  backward  propagation  and  weight  adjustment)  of  about  2,000 
sample  patterns  and  took  less  than  one  hour  on  a  SUN-4  Sparc  station.  Of  course,  the 
training  time  varies  with  the  sample  size  when  all  other  network  parameters  remain  the 
same. 
SCHEMATIC 3-LAYER "TRAINED" NETWORK
Figure 14.  Schematic diagram of a trained, 3-layer, feed-forward neural network (8-4-2). The input nodes are $a_i$ (polarization attributes);
$w_{ij}$ are the weights from input to middle (hidden) layer; $a_j$ are the middle layer nodes; $w_{jk}$ are the weights from middle to output layer; and
$a_k$ are the output nodes (P or S). This particular example shows the identification of a P-type phase at GARM.


3.3.4  Confidence  Factors 

We estimated an empirical confidence measure for the phase identifications
determined by the neural networks (Figure 15) by comparing the output activations of
each node to the true phase (ground-truth). As shown in Figure 15, an output activation
higher than 0.65 corresponds to greater than 90% confidence in the neural network phase
identification for both phases at ARCESS and for S-type phases at NORESS. The lower
confidence obtained for P-type phases at NORESS is perhaps explained by the scattering
effect introduced by the heterogeneities beneath the array, causing polarization parameters
to be more irregular. This is also noticeable from the histogram distribution, as shown in
Figure 2. There are more rigorous methods for estimating the probability of a phase
identification from the outputs of the neural networks, but we have not implemented them
in the current version.
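An empirical confidence curve of this kind can be sketched by binning the output activations and counting how often the identification agreed with the ground-truth in each bin. The binning scheme, the function name, and the example outcomes are assumptions for illustration; the exact procedure behind Figure 15 may differ.

```python
def empirical_confidence(results, n_bins=10):
    """Fraction of correct identifications in each activation bin.

    `results` is a list of (activation, is_correct) pairs, where activation is
    the winning output node's value in [0, 1]. Returns a list of
    (bin_lower_edge, fraction_correct) for the non-empty bins.
    """
    counts = [[0, 0] for _ in range(n_bins)]        # [correct, total] per bin
    for activation, is_correct in results:
        index = min(int(activation * n_bins), n_bins - 1)
        counts[index][0] += int(is_correct)
        counts[index][1] += 1
    return [(i / n_bins, c / n) for i, (c, n) in enumerate(counts) if n]


# Hypothetical outcomes, for illustration only.
outcomes = [(0.55, True), (0.58, False), (0.72, True), (0.75, True), (0.95, True)]
curve = empirical_confidence(outcomes)
```

Activations near 0.5 fall in bins with low agreement, which matches the interpretation of 0.5 as an indeterminate case.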

EMPIRICALLY-ESTIMATED  CONFIDENCE  FACTORS 


node  activation  value 


Figure  15.  Empirically-estimated  confidence  factors  for  ARCESS  (two  inner  curves)  and  similarly  for 
NORESS.  The  solid  curves  are  for  P-type  phases  and  the  dashed  curves  are  for  S-type  phases. 


4.  RESULTS 


The  percentage  of  correct  identification  for  ARCESS  and  NORESS  was  92-99% 
for  data  with  3-component  snr  >  2.0  and  86-96%  for  all  snr.  However,  this  includes  the 
reduction  in  variance  caused  by  averaging  the  four  3-component  elements  in  these  arrays. 
The  percentage  of  correct  identification  for  each  individual  3-component  station  in  these 
arrays  is  somewhat  lower,  as  described  in  the  next  section. 

4.1  Single  3-Component  Elements  of  NORESS  and  ARCESS 

To  examine  the  effect  of  array  averaging,  which  reduces  variance  in  the 
polarization  measurements,  we  conducted  similar  network  simulations  with  data  from 
each  of  the  3-component  elements  of  the  arrays  ARCESS  and  NORESS.  The  results  are 
shown  in  Table  1.  These  results  show  that  there  are  small  variations  in  identification 
accuracy  among  data  from  the  individual  elements.  However,  there  is  about  an  8% 
difference  between  the  results  for  all  snr  and  for  snr  >  2.0. 


TABLE 1.  SINGLE 3-COMPONENT SITES OF ARRAYS
Average Percentages of Correct Identification of Both P-type and S-type Phases

ARCESS      ARA0    ARC7          [illegible]   [illegible]
ALL SNR     85.2    83.3          81.0          81.3
SNR > 2     92.4    92.0          87.7          89.5

NORESS      NRA0    [illegible]   NRC2          [illegible]
ALL SNR     80.5    76.4          79.3          79.8
SNR > 2     92.3    89.8          92.4          90.1
4.2  Comparative Evaluation

Another  objective  of  the  DARPA  neural  network  program  is  to  evaluate  the 
performance  of  this  technique  compared  to  existing  techniques.  We  compared  the  neural 
network  results  obtained  for  ARCESS  and  NORESS  data  to  those  obtained  using  a 
multivariate  discriminant  approach  on  a  common  data  set.  The  multivariate  analysis  is 
being  performed  by  Drs.  Anne  Suteau-Henson  and  Jerry  Carter  at  CSS.  Preliminary 
results  show  that  the  identification  accuracy  obtained  by  neural  networks  is  3-7%  higher 
than that obtained by the multivariate statistical approach (Table 2). There are some
discrepancies  in  the  data  set  that  was  used,  which  may  reduce  this  difference. 
Nevertheless,  the  improvements  obtained  by  the  neural  networks  were  greater  for  S-type 
phases  than  they  were  for  P-type  phases.  We  are  currently  examining  the  attributes  of  the 
phases  that  were  not  correctly  identified  by  either  method  to  see  if  there  are  consistencies 
among  them  that  could  be  used  to  improve  the  overall  performance. 


TABLE 2.  COMPARATIVE PERFORMANCE
Percentages of Correct Identification

                          ARCESS                ARCESS (SNR > 2)   NORESS          NORESS (SNR > 2)
                          P             S       P       S          P       S       P             S
Neural Network            [illegible]   95.5    92.5    98.5       86.0    96.0    [illegible]   99.0
Multivariate
Discriminant Analysis*    86.5          88.5    90.7    92.6       86.2    89.3    94.0          96.0

*[Performed by A. Henson and J. Carter]


4.3  Adaptability 

One  of  the  goals  of  this  program  is  to  examine  the  adaptability  (and  generality)  of 
the  trained  neural  networks  to  data  from  differing  geologic  environments.  We  initially 
tested  the  generalization  capability  of  trained  neural  networks  and  their  adaptability  to 
data  from  a  new  site  by  applying  them  to  data  recorded  by  one  of  the  IRIS  stations 
(GARM)  in  the  former  Soviet  Union.  We  found  that  networks  that  were  trained  with 
NORESS/ARCESS data performed at about 80% accuracy level when tested directly with
data  recorded  by  GARM,  without  retraining.  The  identification  accuracy  increased  by 
about  10%  after  retraining,  using  data  recorded  at  GARM. 

Similar  experiments  were  conducted  for  all  the  available  stations  in  order  to 
introduce  greater  variability  in  the  geologic  conditions  of  our  tests.  Table  3  shows  the 
results  of  these  tests.  The  polarization  data  used  for  these  tests  have  3-component  snr  > 
2.0. In order to have a comparable estimate, we chose data from only one 3-component
element  of  the  arrays,  ARCESS  and  NORESS.  As  shown  in  Table  3,  the  diagonals  show 
the  training  and  testing  with  data  from  the  same  station.  The  off-diagonals  show  the 
results  of  cross-testing  (i.e.,  adaptability  testing).  It  is  observed  that  the  identification 
accuracy  is  about  10-15%  higher  if  testing  and  training  use  data  from  the  same  station.  A 
trained  network  generally  shows  about  80%  correct  identification  of  phases  if  applied  to 
data  from  a  new  site.  Thus,  the  propagation  characteristics  are  similar  for  all  geological 
environments  tested,  to  the  extent  that  80%  of  the  detections  have  similar  polarization 
characteristics.  The  rest  of  the  increase  by  10-15%  upon  retraining  may  be  attributed  to 
the  site-specific  characteristics  of  the  different  regions. 


TABLE 3.  ADAPTABILITY

Average Percentages of Correct Identification of Both P-type and S-type Phases (snr > 2)

          ARCESS   NORESS   FINESA   GERESS   KSP      GARM
ARCESS    92.35    87.69    82.11    89.71    77.74    89.09
NORESS    90.69    92.73    76.69    85.75    79.12    80.53
FINESA    89.80    84.2?    9?.79    86.69    87.20    87.10
GERESS    90.36    87.27    83.17    9?.?6    77.05    88.31
KSP       83.80    84.97    81.81    78.77    92.59    80.71
GARM      59.33    68.60    72.22    80.54    70.24    93.??

4.4  Adding  Context 

The polarization attributes that were used for neural network phase identification
did not include any contextual information, such as the relative detection times of the
corresponding phases. Therefore, as a next step we augmented the polarization data with
“context” in an effort to improve identification accuracy. So far we have considered two
such contextual parameters. The first is the difference between the number of detections
that arrive before the detection in question and the number of detections that follow it
within a fixed time window. An example of the distribution of this parameter is shown in
Figure 16 for the arrivals at the station KSP. The figure also shows a second contextual
parameter, obtained from the mean time differences between the detection in question and
the detections before and after it within a fixed time window. These contextual parameters
show better separations than many of the polarization attributes (Figures 2-9). When
these are added to the polarization parameters in separate simulations, the percentage of
correct identification of phases observed at KSP increases by 3-5%. The window length
used in the contextual parameters was chosen empirically, and is governed by the nature
of the seismicity observed at a given station.
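
The two contextual parameters described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study. The function name `context_features` is hypothetical, the window is assumed to extend 30 seconds on each side of the detection, the second parameter is assumed to be the mean lag to preceding detections minus the mean lag to following detections, and a side with no detections is assumed to contribute a mean of zero.

```python
def context_features(times, i, window=30.0):
    """Contextual parameters for detection i in a list of detection times (s).

    Returns (count_diff, mean_diff):
      count_diff = number of detections in the window before detection i
                   minus the number in the window after it
      mean_diff  = mean time lag to the preceding detections minus the
                   mean time lag to the following detections
                   (assumed form; 0.0 is used for an empty side)
    """
    t = times[i]
    # Time lags to detections that fall inside the window on each side.
    before = [t - s for s in times if 0 < t - s <= window]
    after = [s - t for s in times if 0 < s - t <= window]

    count_diff = len(before) - len(after)
    mean_before = sum(before) / len(before) if before else 0.0
    mean_after = sum(after) / len(after) if after else 0.0
    return count_diff, mean_before - mean_after


# Made-up detection times, for illustration only.
times = [0.0, 5.0, 12.0, 40.0, 100.0]
print(context_features(times, 2))  # two detections before, one after
```

Under these assumptions a first arrival such as P, which tends to have few preceding and several following detections, would give a negative count difference, while a later arrival would tend toward a positive one.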




CONTEXTUAL PARAMETERS (30 SECOND WINDOW)

Figure 16. The histograms on the left show the difference between the number of detections that arrive
before the detection in question and the number of detections following it, for a fixed window length of 30
seconds. Similarly, the histograms on the right show the differences between the mean arrival times before
and after the detection in question within a fixed time window.

5.  INTEGRATION  INTO  IMS


We are currently replacing the rule-based initial phase identification in IMS with
our neural network approach. We have implemented the neural network module for initial
phase identification in a test version of ESAL, which is a knowledge-based system
component of IMS. This initial implementation will allow us to choose between the
neural network and the rule-based methods so that we can apply both to the same data
(Figure 17). This will provide a basis for a direct comparison of the two methods under
operational conditions. We will test this performance using 3-component data recorded by
the IRIS stations in the former Soviet Union.


[Figure 17 diagram; legible labels: Waveform data, Initial Phase Identification]

Figure 17. System Integration. This diagram shows the integration of the neural network initial phase
identification module into the rule-based component (Expert System for Association and Location) of the
IMS system. The initial phase identification element of the expert system will be replaced by a trained neural
network.
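
The idea of applying both methods to the same data for a direct comparison can be sketched as a small harness. This is an illustrative sketch only and does not reflect the ESAL or IMS code. The function `compare_methods`, the detection records, the feature name `x`, and both stand-in classifiers are hypothetical, and the resulting scores carry no meaning.

```python
def compare_methods(detections, labels, methods):
    """Apply every classifier in `methods` to the same detections and
    return the fraction of correctly identified phases for each one."""
    scores = {}
    for name, classify in methods.items():
        predicted = [classify(d) for d in detections]
        correct = sum(p == y for p, y in zip(predicted, labels))
        scores[name] = correct / len(labels)
    return scores


# Toy stand-ins (hypothetical): each detection holds one made-up feature,
# and neither classifier represents the real rule base or trained network.
detections = [{"x": 0.9}, {"x": 0.2}, {"x": 0.6}, {"x": 0.4}]
labels = ["P", "S", "P", "S"]
methods = {
    "method_a": lambda d: "P" if d["x"] > 0.7 else "S",
    "method_b": lambda d: "P" if d["x"] > 0.5 else "S",
}
scores = compare_methods(detections, labels, methods)
print(scores)
```

Passing the classifiers in as a dictionary keeps the input data identical for every method, which is the property the comparison depends on.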

6.  SUMMARY


We have developed and implemented a neural network technique for initial phase
identification using polarization measurements from 3-component data. This technique
has the following advantages:

• It is easier to develop than rules because phase identification is based on
high-dimensional multivariate input data.

• It incorporates station-specific characteristics.

• It performs 3-7% better than a linear multivariate discriminant analysis method
(particularly for data with low snr).

• It is easily adapted to data from new stations. For example, we find that we
achieve 75-80% identification accuracy for a new station without system retraining
(e.g., using a network derived from data from a different station). The data
required for retraining can be accumulated in about two weeks of continuous
operation of the new station, and training takes less than one hour on a Sun4 Sparc
station. After this retraining, the identification accuracy increases to > 90%.

These neural networks are being implemented in DARPA’s Intelligent Monitoring
System, which is in operation at the Center for Seismic Studies.